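This paper presents the Lingjing team's experimental scheme for NLPCC-2022-Shared-Task-4, Multi-modal Dialogue Understanding and Generation (MDUG). The MDUG task can be divided into two phases: multi-modal context understanding and response generation. To make full use of visual information for both scene understanding and dialogue generation, we propose a scene-aware prompt for the MDUG task. Specifically, we use a multi-task strategy to jointly model scene-level and session-level multi-modal understanding. Visual captions are adopted to capture the scene information, while fixed-type templated prompts based on the scene-aware and session-aware labels are used to further improve dialogue generation performance. Extensive experimental results show that the proposed method achieves state-of-the-art (SOTA) performance compared with other competitive methods, and we rank first in all three subtasks of this MDUG competition.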
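Lossless image compression algorithms based on generative models have achieved great success in improving the compression ratio. However, even with the most advanced AI accelerator chips, the throughput of most of them is below 1 MB/s, which rules them out for most real-world applications, where 100 MB/s is often required. In this paper, we propose PILC, an end-to-end lossless image compression framework that achieves 200 MB/s for both compression and decompression on a single NVIDIA Tesla V100 GPU, 10 times faster than the most efficient previous method. To obtain this result, we first develop an AI codec that combines an auto-regressive model with VQ-VAE and performs well in a lightweight setting, and we then design a low-complexity entropy coder that works well with our codec. Experiments show that our framework compresses 30% better than PNG on multiple datasets. We believe this is an important step toward bringing AI compression into commercial use.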
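Acronym disambiguation means finding the correct meaning of an ambiguous acronym from a dictionary, and it is one of the key points of scientific document understanding (SDU@AAAI-22). Recently, many attempts have been made to solve this problem by fine-tuning pre-trained masked language models (MLMs) to obtain better acronym representations. However, the meaning of an acronym varies across contexts, and the corresponding sentence representations follow an anisotropic distribution that occupies only a narrow subset of the entire representation space. Such representations from pre-trained MLMs are not ideal for disambiguating acronyms against a given dictionary. In this paper, we propose a Simple framework for Contrastive Learning of Acronym Disambiguation (SimCLAD) to better understand acronym meanings. Specifically, we design a novel continual contrastive pre-training method that enhances the generalization ability of the pre-trained model by learning an isotropic and discriminative distribution of acronym sentence representations. Results on acronym disambiguation in the English scientific domain show that the proposed method outperforms all other competitive state-of-the-art (SOTA) methods.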
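To make the contrastive objective concrete, here is a minimal sketch of a generic InfoNCE-style loss over sentence embeddings. It is an illustration under stated assumptions, not the exact SimCLAD objective: the function names `cosine` and `info_nce`, the temperature of 0.05, and the use of in-batch negatives are all assumed for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive example for anchors[i]; every other
    entry of `positives` acts as an in-batch negative (an assumption
    of this sketch, not something stated in the abstract).
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

Minimizing a loss of this kind pulls each sentence toward its positive and pushes it away from the other sentences in the batch, which is one common way to obtain more isotropic and discriminative representations.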
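Acronym extraction aims to find acronyms (i.e., short forms) and their meanings (i.e., long forms) in documents, which is important for scientific document understanding (SDU@AAAI-22) tasks. Previous works model this task as a paragraph-level sequence labeling problem. However, this formulation does not make effective use of external knowledge, especially when the datasets are in a low-resource setting. Recently, prompt-based methods built on large pre-trained language models have been shown to significantly improve performance on low-resource downstream tasks. In this paper, we propose a Prompt-based Sequence Generation (PSG) method for the acronym extraction task. Specifically, we design a template that prompts the model to generate the extracted acronym texts auto-regressively. A position extraction algorithm is then designed to locate the generated answers in the source text. Results on acronym extraction for Vietnamese and Persian in a low-resource setting show that the proposed method outperforms all other competitive state-of-the-art (SOTA) methods.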
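The position extraction step can be pictured as mapping generated answer strings back to character spans in the input. The sketch below is one simple way to do that with exact substring matching; the function name `extract_positions` and the matching strategy are assumptions for illustration, since the abstract does not describe the authors' algorithm in detail.

```python
def extract_positions(text, generated_answers):
    """Locate each generated answer in `text`.

    Returns a sorted list of (start, end, answer) tuples, where `end`
    is exclusive. Exact substring matching is an assumption of this
    sketch; a real system may need fuzzy or token-level matching.
    """
    spans = set()
    for answer in generated_answers:
        if not answer:
            continue  # skip empty generations
        start = text.find(answer)
        while start != -1:
            spans.add((start, start + len(answer), answer))
            # Continue searching one character later to catch repeats.
            start = text.find(answer, start + 1)
    return sorted(spans)
```

For example, calling `extract_positions(paragraph, ["AE"])` returns one span for every occurrence of "AE" in `paragraph`, which can then be converted to whatever span format the evaluation expects.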
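Deep neural networks notoriously suffer from dataset biases, which are detrimental to model robustness, generalization, and fairness. In this work, we propose a two-stage debiasing scheme to combat stubborn unknown biases. Starting from an analysis of the factors that give rise to biased models, we design a novel learning objective that cannot be reached by relying on biases alone. Specifically, debiased models are obtained with the proposed Gradient Alignment (GA), which dynamically balances the contributions of bias-aligned and bias-conflicting samples throughout the whole training process, forcing models to exploit intrinsic cues to make fair decisions. In real-world scenarios, however, potential biases are extremely hard to discover and prohibitively expensive to label manually. We therefore further propose an automatic bias-conflicting sample mining method based on peer-picking and training ensembles, which requires no prior knowledge of the bias information. Experiments on multiple datasets in various settings demonstrate the effectiveness and robustness of the proposed scheme, which successfully mitigates the negative impact of unknown biases and achieves state-of-the-art performance.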
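One way to picture "balancing the contributions" of the two sample groups is to rescale the bias-conflicting losses so that both groups contribute equally to the batch objective. The sketch below does this using per-sample loss as a proxy for gradient contribution. That proxy, the function name `balanced_loss`, and the exact rescaling rule are assumptions for illustration and should not be read as the paper's definition of Gradient Alignment.

```python
def balanced_loss(losses, is_conflicting, eps=1e-12):
    """Re-weight per-sample losses so both groups contribute equally.

    losses:         list of per-sample loss values (floats).
    is_conflicting: list of booleans, True for bias-conflicting samples.
    Returns (mean weighted loss, ratio applied to conflicting samples).
    In real training the ratio would be treated as a constant, with no
    gradient flowing through it (an assumption of this sketch).
    """
    aligned_total = sum(l for l, c in zip(losses, is_conflicting) if not c)
    conflicting_total = sum(l for l, c in zip(losses, is_conflicting) if c)
    # Scale conflicting samples so their summed loss matches the aligned sum.
    ratio = aligned_total / (conflicting_total + eps)
    weighted = [l * ratio if c else l for l, c in zip(losses, is_conflicting)]
    return sum(weighted) / len(losses), ratio
```

Because the ratio is recomputed on every batch, the balance adapts as training progresses, which matches the "dynamic" behavior described above in spirit.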
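Object detectors are typically learned on fully annotated training data with fixed, predefined categories. However, categories often need to be added incrementally. In such scenarios, usually only the original training set annotated with the old classes and some new training data labeled with the new classes are available. Given these limited datasets, a unified detector that can handle all categories is strongly desired. We propose a practical scheme to achieve this. A conflict-free loss is designed to avoid label ambiguity, yielding an acceptable detector in a single training round. To further improve performance, we propose a retraining phase in which Monte Carlo Dropout is employed to compute localization confidence in order to mine more accurate bounding boxes, and an overlap-weighted method is proposed to make better use of pseudo annotations during retraining. Extensive experiments demonstrate the effectiveness of our method.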
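As a rough illustration of how Monte Carlo Dropout can yield a localization confidence, the sketch below takes the boxes predicted for one detection across several stochastic forward passes and turns their spread into a score. The function name `localization_confidence`, the normalization by box size, and the `exp(-spread)` mapping are assumptions made for this example; the abstract does not specify the formula the authors use.

```python
import math
import statistics


def localization_confidence(box_samples):
    """Score how stable a box is across stochastic forward passes.

    box_samples: list of (x1, y1, x2, y2) tuples, one per forward pass
                 with dropout kept active at inference time.
    Returns (mean_box, confidence), with confidence in (0, 1].
    """
    coords = list(zip(*box_samples))  # four tuples, one per coordinate
    mean_box = tuple(statistics.fmean(c) for c in coords)
    width = mean_box[2] - mean_box[0]
    height = mean_box[3] - mean_box[1]
    # Normalize the spread by box size so large boxes are not penalized.
    scale = max(math.sqrt(max(width * height, 0.0)), 1e-6)
    spread = statistics.fmean([statistics.pstdev(c) for c in coords]) / scale
    return mean_box, math.exp(-spread)
```

Boxes whose coordinates barely move across passes get a confidence near 1 and are good candidates for pseudo annotations, while unstable boxes score lower and can be filtered out or down-weighted.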
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
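To illustrate what "adding triggers to the raw data" can look like, the toy sketch below stamps a small patch onto a grayscale image and relabels a fraction of a dataset. The patch shape, its bottom-right position, the poisoning interval, and the names `add_trigger` and `poison` are all assumptions for illustration; the abstract does not specify these details for NAIVEATTACK.

```python
def add_trigger(image, patch_size=2, value=1.0):
    """Return a copy of `image` with a square patch in the bottom-right corner.

    image: H x W grayscale image as a list of lists of floats.
    The patch location and intensity are hypothetical choices.
    """
    stamped = [row[:] for row in image]  # copy so the input is untouched
    height, width = len(stamped), len(stamped[0])
    for r in range(max(0, height - patch_size), height):
        for c in range(max(0, width - patch_size), width):
            stamped[r][c] = value
    return stamped


def poison(dataset, target_label, every=10):
    """Stamp and relabel every `every`-th (image, label) pair.

    The fixed interval stands in for a poisoning ratio and is an
    assumption of this sketch.
    """
    return [
        (add_trigger(img), target_label) if i % every == 0 else (img, label)
        for i, (img, label) in enumerate(dataset)
    ]
```

In this simplified picture, the poisoned set would then be passed to the distillation procedure in place of the clean data, so the trigger is baked into the synthetic samples rather than injected at model training time.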
Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by fine-tuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves is even more so, as there is little precedent in the literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 against those played by human professionals, exposing the strengths and weaknesses of generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.